ABSTRACT

This article examines one of the challenges to the
integration of classroom and large-scale portfolio assessment, a
challenge posed by the use of a student’s classroom portfolio for
large-scale assessment of his or her individual competencies. When
work is composed with the support of peers, teachers, and parents,
whose work is being judged? The validity of inferences drawn from the
assessment can be compromised unless the question can be answered.
Experience with portfolio assessment in Vermont and in another study
conducted by the Center for Research on Evaluation, Standards, and
Student Testing (California) suggests that the quality of student work
reflects not only a student's competence, but also the amount and
quality of support received from others. Procedures that highlight a
student's contribution to the work must be developed, complicated
though this will be. Nevertheless, large-scale portfolio assessment
programs appear to carry significant benefits for instructional
reform. (Contains 5 tables and 73 references.) (SLD)



Reproductions supplied by EDRS are the best that can be made
from the original document.












PORTFOLIO ASSESSMENT: WHOSE WORK IS IT? 

ISSUES IN THE USE OF CLASSROOM ASSIGNMENTS FOR ACCOUNTABILITY 



Maryl Gearhart & Joan L. Herman



Center for the Study of Evaluation 

National Center for Research on Evaluation, Standards, & Student Testing 
University of California, Los Angeles 









UCLA's Center for the Study of Evaluation &
The National Center for Research on Evaluation, Standards, and Student Testing

Evaluation Comment

Winter 1995






To many engaged in educational reform, portfolio assessment captures
a vision of assessment integrated with instruction. Concerned about
the equity and validity of large-scale assessment, portfolio advocates
argue that students' classroom work and their reflections on that work
provide a richer and truer picture of students' competencies than do
traditional or other on-demand assessments. Concerned about the
impact of testing on teaching, advocates point out that, as displays
of the products of instruction, portfolios challenge teachers and
students to focus on meaningful outcomes. Furthermore, portfolio
assessment practices support the assessment of long-term projects over
time, encourage student-initiated revision, and provide a context for
presentation, guidance, and critique. Given such an ambitious agenda
for assessment, instruction, and accountability, it is no surprise that
what is meant by “portfolio” or “portfolio assessment” varies markedly
in practice and purpose.




Shared by most large-scale assessment projects, however, is a
commitment to bridge the worlds of public accountability and classroom
practice. The goal is to give students, teachers, and policymakers
authentic roles in the assessment of students at all levels of an
accountability system and to provide data that are appropriate and
useful at each level. The portfolio spans one level of decision making
to the next, providing detailed evidence at the classroom level of the
process and outcomes of student performance to guide instruction and
learning, and then supporting more abridged inferences at the
large-scale level about the quality of performance and schooling.
Integrated with instruction and targeted on high standards for student
performance, the portfolio is the bridge that supports reform of
classroom practices on the one side and accountability on the other.
The vision is enticing, but will it work? Can classroom work be
utilized for large-scale, high-stakes assessment?

In this article, we examine one of the challenges to the integration
of classroom and large-scale portfolio assessment, a challenge posed by
the use of a student’s classroom portfolio for large-scale assessment
of his or her individual competencies. When raters working outside the
classroom context are asked to make judgments about an individual
student based on a portfolio of work composed with the support of
peers, teachers, and parents, whose work is being judged? We argue that
certain answers to this question could threaten the validity of
inferences that can be drawn about individual performance from
portfolios constructed in the social complexities of classroom life.
Thus the investigation of possible answers to the “whose work” question
becomes an essential component to the study of the validity of
portfolio assessment.






Concerns regarding the validity of individual student scores are
already emerging in the fledgling technical literature on portfolio
assessment. Consider, for example, findings from two CRESST efforts to
provide evidence of validity. In both of these studies, patterns of
relationships among on-demand assessments and portfolio assessments
raise questions about the validity of test scores.

• Koretz and his RAND colleagues have been 
evaluating Vermont’s statewide portfolio assessment
program since 1990. The Vermont program
targets writing and mathematics at Grades
4 and 8 and includes three components for each
subject area: year-long student portfolios, “best
pieces” drawn from the portfolios, and
state-sponsored “uniform” tests which are standardized
but not necessarily multiple-choice.
Patterns of relationships between the results of
portfolio assessment and uniform tests in both 
subjects were problematic (Koretz, Klein, 
McCaffrey, & Stecher, 1993). While recogniz- 
ing that portfolios and standard assessment may 
well emphasize different aspects of a subject 
domain, the researchers expected correlations 
between the two types of assessments within a 
subject to be stronger than those across subject 
areas. Instead, they found essentially the same 
level of correlation within and across subject 
areas: For example, in writing, writing portfolio 
scores correlated moderately with the standard 
measure of writing and with the portfolio and 
standard measures of mathematics. 

• Gearhart and Herman have conducted two technical studies of the
ratability of classroom writing portfolios. In an initial study, the
researchers found no relationship between scores for writing
portfolios and for standard writing assessments: Two-thirds of the
students classified as competent based on the portfolio score were not
so classified on the basis of the standard assessment. Similarly,
there was only a weak relationship between contrasting procedures for
portfolio scoring: Half the students classified as competent on the
basis of the single portfolio score were not so classified when scores
for individual pieces were averaged, though correlations between the
two kinds of portfolio scores were moderately high (in the .6 range)
(Gearhart, Herman, Baker, & Whittaker, 1992, 1993; Herman, Gearhart, &
Baker, 1993). In a subsequent comparative study of two writing
rubrics, the researchers found a positive relationship between
portfolio scores and standard writing assessments for only one of the
rubrics (Gearhart, Novak, & Herman, in press).
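
The within-subject versus cross-subject comparison described for the Vermont data can be sketched in a few lines of Python. This is only an illustration: the score lists, the measure labels, and the helper functions `pearson` and `within_and_across` are hypothetical and are not drawn from the Vermont or CRESST datasets. The sketch computes a Pearson correlation for every pair of measures and averages the same-subject pairs separately from the different-subject pairs. If the measures mainly reflected subject-specific competence, the within-subject average would be expected to be the larger of the two.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def within_and_across(scores):
    """Average correlation for same-subject and different-subject pairs.

    `scores` maps (subject, measure_type) -> list of student scores,
    with students listed in the same order in every list.
    """
    within, across = [], []
    for (key_a, xs), (key_b, ys) in combinations(scores.items(), 2):
        r = pearson(xs, ys)
        # Pairs sharing a subject are "within"; all others are "across".
        (within if key_a[0] == key_b[0] else across).append(r)
    return sum(within) / len(within), sum(across) / len(across)


# Hypothetical scores for six students (made up for illustration).
example_scores = {
    ("writing", "portfolio"): [2, 3, 3, 4, 2, 4],
    ("writing", "uniform"): [2, 2, 3, 4, 3, 4],
    ("math", "portfolio"): [3, 2, 4, 3, 2, 4],
    ("math", "uniform"): [3, 3, 4, 2, 2, 4],
}

mean_within, mean_across = within_and_across(example_scores)
print(f"within-subject r = {mean_within:.2f}, cross-subject r = {mean_across:.2f}")
```

A result in which the two averages are about equal would correspond to the problematic pattern reported above, in which writing portfolio scores related to mathematics measures about as strongly as to the writing test.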

Granted, these researchers were hampered in their quest for
validation by the paucity of technically sound, performance-based
criterion measures to which portfolio scores could be compared.
Nevertheless, within the constraints set by the current state of the
art in performance assessment, findings like those we have illustrated
do raise questions about the validity of portfolio scores as measures
of individual performance. What factors may have contributed to these
weak relationships between portfolio scores and on-demand assessments?
No doubt there are many, and each will require further investigation.






Consider just two that focus on measurement design: The portfolio and
on-demand assessments may have tapped different domains of performance
within a subject area; the on-demand and portfolio tasks may have
differed in difficulty. The factor that we consider in this paper
arises from the classroom context of portfolio assessment. As summed up
by a Vermont teacher after rating portfolios for several days: “Whose
work is this anyway?”



We begin our discussion by examining the ways in which the nature of
classroom work may undermine the validity of “individual” portfolio
scores. We illustrate with CRESST data from both an evaluation of a
statewide assessment program and a laboratory study of the scorability
of elementary writing portfolios. We conclude with a discussion of the
implications of the “whose work” issue for portfolio assessment policy
and practice.

Whose Work Is It? An Issue for the Validity of
Large-Scale Portfolio Assessment

The “whose work is it?” question arises because individual student
portfolios are constructed in a social context. Portfolios contain the
products of classroom instruction, and good classroom instruction
according to current pedagogical and curriculum reforms involves an
engaged community of practitioners in a supportive learning process
(Camp, 1993; Duschl & Gitomer, 1991; Wolf, D. P., 1989; Wolf, Bixby,
Glenn, & Gardner, 1991; Wolf & Gearhart, 1993a, 1993b). Exemplary
instructional practice, in short, supports student performance.
Central to the National Writing Project, for example, is a core
instructional model which features multiple stages: prewriting,
precomposing, writing, sharing, revising, editing, and evaluation. Each
of these stages stands for instructional activities that engage a
student with resources and with others: related readings, classroom
discussions, field trips, idea webs, small group collaboration,
outlining, peer review, review and feedback. The socially contexted
character of student writing is seen both as a scaffold for students’
writing process and a replication of what “real” writing entails, in
that writing is often a very social endeavor. Consider as well what is
regarded as exemplary portfolio assessment practice. A “portfolio
culture” is viewed as “replacing ... the entire envelope of
assessment ... with extended, iterative processes, agreeing that we are
interested in what students produce when they are given access to
models, criticism, and the option to revise” (Wolf, D. P., 1993,
p. 221). Assessment opportunities are available at multiple classroom
moments (in the course of the work that may be added to a portfolio,
in the construction of the portfolio, and in a presentation of the
portfolio), making collaboration, assessment, and revision continual
processes within the classroom.

These visions of an engaged community of learners and reviewers have
implications for the validity of classroom portfolios for large-scale
assessment purposes: The more developed the community, the more
engaged others will be in the work tagged with an individual student’s
name. While the locus of authorship may shift outward from the
individual student to the community of writers, the shift is unlikely
to be systematic: Others' contributions to students’ work are likely to
vary across assignments, students, and classrooms. An irony emerges
that when the student’s work is more her own, that work may index
practices and curriculum that lack certain key features of current
reforms.

How is a rater unfamiliar with a student or the classroom context to
assign an individual student a score for a portfolio collection that
includes assisted or collaborative work? Research by Webb (1993)
suggests that an individual’s performance in the context of group
activity may or may not represent his or her capability. Her finding,
for example, that low-ability students had higher scores on the basis
of group work than on individual work suggests that a rater's score for
a portfolio may overestimate student performance because it
constitutes a rating of efforts that were assisted. Alternatively, the
rater who is aware that work is assisted may adjust downward the
individual's score, again biasing the rating.



Whose Work Is It? Data From CRESST 
Studies 

While questions regarding the roles of authorship and assisted
performance in large-scale portfolio assessment have been raised
(Condon & Hamp-Lyons, 1991; Gitomer, personal communication,
September, 1994; Herman et al., 1993; Koretz, McCaffrey, Klein, Bell, &
Stecher, 1993; Koretz, Stecher, & Deibert, 1992; Koretz, personal
communication, September, 1994; Stecher & Hamilton, 1994), they have
been neither directly investigated nor widely discussed. As we discuss
next, however, preliminary results from the CRESST Vermont studies
(Koretz, Stecher, Klein, & McCaffrey, in press) and the
laboratory-based studies of portfolio ratability (Gearhart et al.,
1992; Gearhart, Herman, Novak, Wolf, & Abedi, 1994; Herman et al.,
1993) add some empirical basis for concern. These studies suggest
substantial variability in instructional support for students’ work,
variability which may well compromise the meaning and comparability of
scores within as well as between classrooms and schools.

Vermont 

While the RAND evaluation addresses three broad issues (the actual
implementation of the program in schools and classrooms, the program’s
diverse effects, and the quality of the information yielded by the
assessment), of interest here are results from a survey distributed to
all fourth- and eighth-grade math teachers during the second year
(1992-93) of Vermont’s statewide implementation. Results are based on
the responses of approximately 52% of the mathematics teachers at
Grade 4 (N = 382) and 41% at Grade 8 (N = 137) (p. 6).

Teachers' responses to a number of questions indicated substantial
variation in how mathematics portfolios were implemented across
classrooms, and consequently substantial variation in how much help
and support students received in putting their “best face forward” for
the portfolio assessment. Teachers' reported policies on revising best
pieces are a first case in point: Although more teachers encouraged
revision of most best pieces (57%), many teachers departed from this
pattern by either requiring revision (19%), simply permitting it (19%),
or generally prohibiting it (5%). Similarly, the amount of time
students spent revising varied widely. The average time in revision
was 30-40 minutes, but in 17% of classrooms students did not revise at
all, and in another 15% of classrooms students took more than one full
class period to revise a best piece. Provision of time and support for
revision clearly represents an aid to performance, and thus students
who are not encouraged to revise their best pieces may well be at a
disadvantage relative to those students who are provided greater
opportunities to revise.

There also was considerable variation in teachers' policies regarding
who was permitted to assist students in revising their best pieces
(Table 1). One in four teachers did not report assisting their own
students in revisions, and a similar proportion did not report
permitting students to help each other. Seventy percent of fourth-grade
teachers and 39% of eighth-grade teachers forbade parental or other
outside assistance. Further complicating these findings regarding
classroom variation, roughly 10% of teachers reported that their
policies regarding assistance varied for different students within
their classrooms. Teachers’ policies also differed with respect to
acknowledgment of outside help. Only about 20% of teachers required
their students to acknowledge or describe the assistance they
received, and, therefore, the raters of most students’ portfolios would
not know who contributed to the entries or the nature of their
assistance.

Table 1

Assistance Allowed by Teachers on Best Pieces
(Percentage of Teachers)

                                 Assistance allowed on       Rules differ
                                 which best pieces?          for individual
Source               Grade    None   Some   Most   All       students

Teacher                4       27     23     14     16          21
                       8       27     32      9     13          19

Other students         4       34     31     11     12           1
                       8       23     39     11     12          13

Parents or others      4ᵃ      71     13      4      4           8
outside school         8       39     28      8     13          11

ᵃ Grade level difference significant at the 5% level (p < .05).

Finally, the Vermont teachers reported substantially different degrees
of influence on students’ choices of “best pieces” for their
portfolios (Table 2): Some teachers reported playing an equal role with
their students in making portfolio selections, while others reported
no role at all. Certainly the type and quality of the work that becomes
part of a student’s portfolio can be influenced by who selects the
pieces for inclusion. In particular, since teachers presumably have a
better understanding of the scoring criteria than do students, the
portfolios of students whose choices were assisted by their teachers
may be more likely to show students’ capabilities.

Thus the RAND/CRESST study found sizable variations among classrooms
in factors such as the amount of revision that was permitted and the
extent to which teachers limited assistance from others. These
implementation findings may help to explain the weak patterns of
relationships between portfolio scores and on-demand assessments,
relationships described earlier. If some teachers provide (directly or
through other adults or students) more help than others, comparisons
among the portfolio scores of their students would be clouded by the
contributions that others make to a given student’s portfolio. Because
such factors enter only into portfolio scores and not into scores on a
standardized, on-demand assessment, they would tend to weaken the
relationships between portfolio scores and scores from on-demand
assessments.
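
The attenuation argument above can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of the Vermont data: it assumes a hypothetical model in which an on-demand score is a student's competence plus noise, and a portfolio score is that same competence plus an assistance term that is unrelated to competence, plus noise. The function name `simulated_correlation`, the noise level of 0.5, and the sample size are all illustrative choices.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def simulated_correlation(assistance_sd, n_students=5000, seed=0):
    """Correlation of on-demand and portfolio scores in a toy model.

    Assumed model (hypothetical, for illustration only):
        on_demand = competence + noise
        portfolio = competence + assistance + noise
    where assistance varies across students independently of competence.
    """
    rng = random.Random(seed)
    on_demand, portfolio = [], []
    for _ in range(n_students):
        competence = rng.gauss(0.0, 1.0)
        assistance = rng.gauss(0.0, assistance_sd)
        on_demand.append(competence + rng.gauss(0.0, 0.5))
        portfolio.append(competence + assistance + rng.gauss(0.0, 0.5))
    return pearson(on_demand, portfolio)


# Larger spread in assistance -> weaker portfolio/on-demand relationship.
for sd in (0.0, 0.5, 1.0, 1.5):
    print(f"assistance sd = {sd:.1f}: r = {simulated_correlation(sd):.2f}")
```

Under these assumptions the correlation falls as the spread of assistance grows, even though every student's underlying competence is unchanged. The real situation is more complicated, since the lab study discussed later suggests support may be related to ability level rather than independent of it.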



Table 2

Who Selects Best Pieces? (Percentage of Teachers)

Who selects best pieces?                   Grade 4ᵃ    Grade 8

Students on their own                         21          30
Students with limited teacher input           55          57
Students and teachers have equal role         18           8
Teacher with limited student input             5           3
Teacher                                        1           1

ᵃ Grade level difference significant at the 5% level (p < .05).

CRESST Laboratory Studies of the Scorability
of Writing Portfolios

To document the contributions of others to the writing contained
within students’ writing portfolios, Gearhart et al. (1993) asked
teachers to rate the level of their instructional support for their
writing assignments. Data were collected in the spring of 1991 from
nine teachers spanning Grades 1 to 6. Each teacher was asked to
designate two students at each of three levels of writing competency
(high, medium, and low), to collect complete portfolios of all of
their work, and for each writing assignment to document the
instructional support provided during the composing and editing phases.



Ratings were keyed to the same dimensions used at that time to assess
students’ writing progress (Baker, Gearhart, & Herman, 1991):
Content/Organization (topic/subtopics or theme, and their structure
and development); Style (elements of text like descriptive language,
word choice, sentence choice, tone, mood, voice, and audience); and
Mechanics (spelling, grammar, punctuation, and other conventions). The
scale points were defined along a continuum from 0 (no support) to 3
(teacher has specified the requirement in detail). Teachers also were
asked to rate each assignment in terms of Copied work (the extent to
which the student’s work appeared to be copied from peers or from
direct modeling by a teacher or parent) and to estimate the time the
child spent on the assignment in hours or fractional parts of hours.
The dataset consisted of spring 1991 ratings of 228 assignments from a
total of 54 students. The number of assignments per student ranged
from 1 to 21, with a modal number of 3. (One teacher returned 14-21
assignments per target student, compared with 1-5 for the remaining
eight teachers.) Across all assignments, teachers reported providing
generally low to moderate levels of support to their target students,
but their reported support differed substantially among students’
competency levels: Teachers were far more likely to report providing
higher levels of support to their “low” students than to their more
able students (Table 3), a finding that raises concerns about the
differential meaning of scores that may be assigned to students’
portfolios.

Table 3

Percentage of Teachers Reporting Greater Support
by Writing Dimension and Student Ability Level

                            Student ability level
Writing dimension             High        Low

Content/organization           ?4          72
Style                          13          55
Mechanics                      26          60

Note. A “greater” level of support was defined as ratings of 2 or 3,
where 2 indicated some guidelines and feedback, and 3 represented
detailed guidelines and feedback.

The patterns of teachers’ reported support differed across the three
writing dimensions, reflecting, it seems, variations in curriculum. In
Table 4, we see that teachers' experience with portfolio assessment was
related to their patterns of instructional support. The three teachers
who had been using portfolios in their classrooms for over a year
tended to report providing higher levels of support than did the six
teachers who had just begun experimenting with portfolio assessment,
and we believe that the more experienced teachers' engagement with a
writing process approach contributed to their greater involvement with
students' assignments (and/or to their greater perceptions of
involvement). Table 5 hints at the ways that teachers' reported levels
of support may be related to grade level as well as portfolio
experience. While these data are purely an illustration from a very
small dataset, we see here two second-grade teachers providing quite
different levels of assistance with style vs. mechanics, and two
fifth-grade teachers differing more in levels of support for style.
Furthermore, the second- and fifth-grade teachers with a year of
portfolio experience reported emphases on different writing
dimensions: the second-grade teacher more concerned with mechanics,
the fifth-grade teacher more concerned with style.

Thus teachers in the Gearhart et al. (1993) study reported variations
in instructional practices that were likely to have impacted
differentially the quality of student work in the portfolios. As in
Vermont, these implementation findings may help to explain the weak
relationships between students' portfolio scores and their standard
writing assessments.
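
The “greater support” percentages reported in Tables 3 through 5 rest on a simple rule: a rating on the 0 to 3 support scale counts as greater support when it is 2 or 3. A minimal sketch of that tabulation follows. The function name `percent_greater_support` and the ratings below are hypothetical and serve only to show the arithmetic; they are not the study's data.

```python
def percent_greater_support(ratings, threshold=2):
    """Percentage of ratings at or above `threshold` on the 0-3 support scale."""
    if not ratings:
        raise ValueError("no ratings supplied")
    greater = sum(1 for rating in ratings if rating >= threshold)
    return 100.0 * greater / len(ratings)


# Hypothetical support ratings for assignments, grouped by the
# teacher-designated competency level of the student (made-up values).
ratings_by_level = {
    "high": [0, 1, 1, 2, 0, 1, 3, 1],
    "low": [2, 3, 1, 2, 3, 2, 0, 3],
}

for level, ratings in ratings_by_level.items():
    print(f"{level}: {percent_greater_support(ratings):.0f}% greater support")
```

With these made-up ratings, 2 of the 8 “high” ratings and 6 of the 8 “low” ratings meet the rule, which mirrors in miniature the direction of the contrast reported in Table 3.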

Table 4

Comparison of Teachers With Little vs. One Year Experience
With Portfolios: Percentage of Assignments Given Greater
Support by Writing Process Dimension

                                      Dimension
Portfolio experience    Focus/organization    Style    Mechanics

Little (n=6)                    54              41         36
One year (n=3)                  92              82         74

Note. A “greater” level of support was defined as ratings of 2 or 3,
where 2 indicated guidelines and feedback, and 3 represented detailed
guidelines and feedback.

Table 5

Illustrative Comparison of Selected Teachers:
Percentage of Assignments Given Greater Support by
Writing Process Dimension

                                             Dimension
Grade and
portfolio experience           Focus/organization    Style    Mechanics

Second grade
  Little experience
  with portfolios                      83              58         67
  One year experience
  with portfolios                      05              18        100

Fifth grade
  Little experience
  with portfolios                      50              29         21
  One year experience
  with portfolios                      72              72         44

Note. A “greater” level of support was defined as ratings of 2 or 3,
where 2 indicated some guidelines and feedback, and 3 represented
detailed guidelines and feedback.

Reflections and Recommendations

Teacher self-report data from two CRESST studies have produced
evidence of variation in how portfolio work is produced and supported.
We acknowledge the flaws of the preliminary self-report data that we
have presented and fully recognize that further research is needed:
studies that employ larger sample sizes and multiple methodologies to
verify the variety of support provided to students and the impact of
such support on assessed performance. But if findings like these can be
substantiated in more systematic research, they suggest that the
quality of student work reflects not only a student's competence but
also the amount and quality of support received from others. Thus,
whose work is the classroom work contained in a student's portfolio?
From the preliminary evidence presented here, it seems it may depend
on students' competence and a range of variable circumstances:
teachers' methods of instruction, the nature of their assignments, peer
and other resources available in the classroom, and home support.

What meaning, then, can a large-scale portfolio assessment program
ascribe to student work contained in portfolio collections? A
professional (whether a writer, scientist, or educational researcher)
who is accustomed to others' input may respond to this question with
philosophical reflection or an identity crisis. Indeed, whose work is
this article, for example? In what ways does it reflect the writing and
research competencies of either of its authors? We value our own
opportunities for collaborative work as much as we value the efforts
to engage students in authentic communities in the classroom. But,
from a measurement perspective, the validity of inferences about
student competence based solely on portfolio work appears suspect.
While this is not a grave concern for classroom assessment where
teachers and students can judge performances with knowledge of their
context, the problem is troubling indeed for large-scale assessment
purposes where comparability of data is an issue. Under what
circumstances, then, can portfolio assessments be used to rank or make
serious decisions about students, teachers, schools, or districts?

The question requires attention to (a) the purposes of portfolio
assessment, (b) the integrated design of portfolio contents, rubric
contents, rating procedures, and uses of the results, and (c) a
recognition of possible conflicts between the measurement and
instructional aims of portfolio assessment. The apparent inverse
relationship between support and students' ability level in the
Gearhart et al. (1993) study is a telling example in this regard.
Certainly if low-performing students are to achieve high standards, it
is likely they will need an enriched instructional process to give
them the capability for transferable performance: ample models,
coaching and mentoring, and multiple opportunities for practice,
feedback, and revision. But if the work that emerges from this same
instructional process is used to assess students' individual
performance, then there will be problems of comparability of scores
across students. Can we bridge this apparent gulf between what is
required to serve the purposes of classroom instruction and large-scale
accountability?



While no easy solutions come to mind, it does appear that any valid
assignment of an individual student score to a portfolio for
large-scale purposes will require procedures to highlight the
student’s contribution to the work. Adjustments in either composition
of the portfolio or rating procedures, or both, will be necessary to
assure comparability of student results. As we outline below, a number
of strategies have been suggested, although most have so far been
rejected for reasons of feasibility, cost, or violation of certain
fundamental portfolio program assumptions, such as instructional
freedom, seamless integration of portfolios with instruction, and
commitment to honor diversity (D. Gitomer, personal communication,
September, 1994; D. Koretz, personal communication, September, 1994;
K. Sheingold, personal communication, September, 1994).

• Restrictions could be imposed on work students produce for their
portfolios, controlling who is permitted to provide assistance and
under what circumstances. These procedures, largely rejected as
violations of instructional freedom, would require verification that
the controls on assistance were in place.

• Portfolios could be “seeded” with students' responses to a standard
performance-based writing assessment; ratings of these entries might
be used to adjust overall portfolio scores, or to raise “red flags”
when scores for standard assessments are discrepant with other
portfolio material. But this option would bring additional
complications. First, many performance-based assessments of writing
and reading currently incorporate components of a process approach
(for example, shared readings and class discussion, or even peer
response), and thus the “seeded” assessments might also need checks
that the assistance provided was comparable across students. Second,
procedures for adjusting portfolio scores would require consensus on a
framework for justifying those procedures: On what grounds can a
student's individual writing be compared against his writing supported
by others' responses and guidance?

Portfolio procedures could incorporate strategies for documenting
others' assistance and input.

• Portfolio entries could be accompanied with context descriptions
that document the resources that were available to a student and the
contributions of others to the process of composing the work. However,
this option would require procedures for (a) producing those
descriptions (training teachers and/or students to provide comparable
information across assignments, students, and classrooms), and
(b) using the context information when rating the portfolio material
(e.g., will raters first rate the work on its own merits, and then
adjust the score after examining evidence of support? Will raters rate
the work and let someone else rate the support and make the
adjustment? Will raters look at all sources of evidence and make some
kind of integrated, summary judgment of the student's individual
contribution and competence?).

• Raters could score a sample of portfolios from a
given classroom at virtually the same time, to
enable them to see how an individual's work
compares with or duplicates the responses of
others from the class. To date, this procedure
has simply been used as an exercise, but it is of
interest that the exercise has yielded two different
conclusions. In the Vermont project,
researchers reviewing mathematics portfolios from
one class noticed identically worded key phrases
in many students' responses to the same assignment.
This suggested that the work had been
structured so heavily by the teacher that the
responses did not really represent the performance
of individual students (S. Klein and D.
Koretz, personal communication, September,
1994). In another project, six target students
were interviewed in each of several portfolio
classrooms, and, in some classrooms, worrisome
commonalities in the content of students' writing
reflected impressive commonalities in students'
understandings of the work based on
their interview responses; students had learned
a great deal from an assignment that emerged
from intensive classroom collaboration. How
could we tell the difference, then, between
common understandings and copying?

There is interest in incorporating assessment of
group process.

• Portfolios could be constructed to provide
evidence for the very interactive processes that
endanger the validity of individual scores
assigned to the final product. That is, if students
are using resources, and soliciting and making
use of input from others, then it makes sense to
document and assess students' competencies
with these ways of working within a writing
community (cf. a current analytic review on
methods for group assessment by Webb, 1994).

Similarly, students' unique contributions to group 
products could be documented: 

• The design of portfolio assessment could
document more explicitly an individual student's
role in a given product. The inclusion of
student self-assessments, peer assessments (when
the work was collaborative), and teacher
assessments could help to clarify a student's unique
contribution; a follow-up individual assignment
could demonstrate what a student had learned
from a collaborative project or a project heavily
guided by the teacher. However, once again,
little is known about the ways that raters would
utilize these inclusions in making a portfolio
judgment.
Listed here to stimulate thinking about
possible solutions, our collated suggestions
provide imperfect and somewhat unwieldy
answers. The alternative? A large-scale,
high-stakes portfolio program could produce individual
student scores but do nothing to address the “whose
work” problem, thereby ensuring invalid
comparisons among students.

This is not to say that large-scale portfolio
assessment is without value. Our own research and that
of others indicates that large-scale portfolio
assessment programs carry significant benefits for
instructional reform (Gearhart & Wolf, 1994;
Herman & Winters, 1994; Koretz, Stecher, Klein,
McCaffrey, & Deibert, 1993; Koretz et al., in press;
Sheingold, Heller, & Paulukonis, in press). In
Vermont, for example, principals and teachers
reported that the portfolio program has induced
sizable and diverse changes in instruction that are
largely consistent with the goals of reform efforts
both in Vermont and nationwide. This is an
important consequence even if portfolios fail to provide
valid individual measurement. But can portfolio
assessment provide us with valid indices of student
competencies usable for large-scale accountability?
There are some promising possibilities: For
example, the question of “whose work” may be less
troublesome if portfolio assessment results are
aggregated for decision making at the school or district
level, using matrix sampling and excluding
individual-level data. As Moss (1992, 1994) argues,
multilevel designs could assign responsibility for
individual-level decisions to the local level, where
portfolio-based decisions about the capability of
individual students can be informed by professional
judgment and knowledge of the local school
context.

Students' portfolio work could also provide
invaluable evidence of students'
“opportunities to learn.” If some controls over
inequitable help from sources outside the classroom
were in place, then portfolio assessment (at any level
of the system) could provide a window on the
quality of curriculum and instruction by showing
what work students are asked to do and how well
they are able to do it. From this perspective, we
would expect students' portfolios to show the best
of what students can do with help from an effective
instructional process.

Finally, in moving forward on large-scale
portfolio assessment, we will need to remember that the
very complexity of portfolio assessment is at once its
strength and its weakness. While portfolios may
resist attempts to reduce their contents to simple,
reliable individual scores, in the hands of skilled
teachers, parents, and students, portfolios should
provide critical contexts for discussing, assessing
and improving the process and the outcomes of
students' learning.

Wolf, D. P. (1993). Assessment as an episode of learning.
In R. Bennett & W. Ward (Eds.), Construction vs. choice
in cognitive measurement (pp. 213-240). Hillsdale, NJ:
Lawrence Erlbaum Associates.

Wolf, D. P., Bixby, J., Glenn, J., & Gardner, H. (1991).
To use their minds well: Investigating new forms of
student assessment. In G. Grant (Ed.), Review of
Research in Education (Vol. 17, pp. 31-74). Washington,
DC: American Educational Research Association.

Wolf, S. A., & Gearhart, M. (1993a). Writing what you
read: Assessment as a learning event (CSE Tech. Rep.
358). Los Angeles: University of California, Center for
Research on Evaluation, Standards, and Student
Testing.

Wolf, S. A., & Gearhart, M. (1993b). Writing what you
read: A guidebook for the assessment of children's
narratives (CSE Resource Paper No. 10). Los Angeles:
University of California, Center for Research on
Evaluation, Standards, and Student Testing.

Yancey, K. B. (1992). Portfolios in the writing classroom:
A final reflection. In K. B. Yancey (Ed.), Portfolios in the
writing classroom: An introduction (pp. 102-116).
Urbana, IL: NCTE.

1 Both authors contributed equally to this article. Portions
of this article are closely adapted from Gearhart, Herman,
Baker, & Whittaker (1993). We thank Noreen Webb,
Dan Koretz, Brian Stecher, Drew Gitomer, and Ron
Dietel for detailed comments and revisions of an earlier
draft.

2 Examples include: Baker, Gearhart, Herman, Tierney, &
Whittaker, 1991; Belanoff & Dickson, 1991; Calfee &
Perfumo, 1992, 1993a, 1993b; Camp, 1990, 1993; Camp
& Levine, 1991; Freedman, 1993; Gentile, 1992; Gill,
1993; Gitomer, 1993; Glazer, Brown, Fantauzzo, Nugent,
& Searfoss, 1993; Hamp-Lyons & Condon, 1993;
Herman, Aschbacher, & Winters, 1992; Herman,
Gearhart, & Aschbacher, in press; Hewitt, 1991, 1993; Hiebert
& Calfee, 1992; Hill, 1992; Howard, 1990; Koretz,
1993; Koretz, Klein, McCaffrey, & Stecher, 1993; Koretz,
McCaffrey, Klein, Bell, & Stecher, 1993; Koretz, Stecher,
& Deibert, 1992; Koretz, Stecher, Klein, & McCaffrey, in
press; Koretz, Stecher, Klein, McCaffrey, & Deibert,
1993; LeMahieu, Eresh, & Wallace, 1992; LeMahieu,
Gitomer, & Eresh, in press; Meyer, Schuman, & Angello,
1990; Mills, 1989; Mills & Brewer, 1988; Moss, 1992;
Moss, Beck, Ebbs, Herter, Matson, Muchmore, Steele, &
Taylor, 1991; Murphy & Smith, 1990, 1992; O'Neil,
1992; Paulson, Paulson, & Meyer, 1991; Reidy, 1992;
Saylor & Overton, 1993; Sheingold, 1994; Simmons,
1990; Smith & Murphy, 1992; Stecher & Hamilton,
1994; Stroble, 1993; Tierney, 1992; Tierney, Carter, &
Desai, 1991; Valencia & Calfee, 1991; Vermont
Department of Education, 1990, 1991a, 1991b, 1991c, 1991d;
Wolf, D. P., 1989, 1993; Wolf, Bixby, Glenn, & Gardner,
1991; Yancey, 1992.

3 Moss (1994) has proposed that high-stakes, large-scale
writing assessment could be conducted in ways that afford
greater local authority and that foster dialogue and
ongoing review of both the methods of portfolio assessment
and the results. Her work is the impetus for very
productive debate concerning a complex set of issues, including
the ways that the goals and models of portfolio assessment
programs may need to be designed differently for different
levels of an assessment system. In this paper, we reify the
features of a “large-scale portfolio assessment program,”
both simplifying their complexity and drawing on features
of several current programs.

4 In the Kentucky program, the classroom teacher scores his
or her own portfolios, but a sampling of portfolios is
rescored by another teacher (usually from another school)
and/or by a KIRIS staff member. Because a given teacher's
scores may be adjusted by patterns of relationships
between his scores and an outsider's, we still view the issue
for Kentucky in the same way: Credibility for a student's
score still derives ultimately from an outsider's judgment.

5 Koretz, 1993; Koretz, Klein, McCaffrey, & Stecher, 1993;
Koretz, McCaffrey, Klein, Bell, & Stecher, 1993; Koretz,
Stecher, & Deibert, 1992; Koretz, Stecher, Klein, &
McCaffrey, 1994, in press; Koretz, Stecher, Klein,
McCaffrey, & Deibert, 1993.

6 There are certainly other possible explanations. Moss
(1994), for example, discusses potential incomparabilities
among methods and purposes of different approaches to
assessment.

7 Again we are assuming a rater who is scoring outside the
classroom context.

8 The description below is taken from Koretz et al., 1994.
Readers are referred to that technical report for more
details about the study.

9 The description below is taken from Gearhart et al., 1993.
Readers are referred to that technical report for more
details about the study.

10 To compensate for variation in the number of
assignments rated per child and the number of children
designated as high, medium, or low in writing ability, weighted
averages were computed for each teacher. Thus, a teacher's
ratings were averaged for each student's assignments, and
then, as appropriate, a “mean of means” was computed for
each teacher, or for “high,” “medium,” and “low”
students for each teacher.
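
The “mean of means” described in this note can be illustrated with a minimal sketch. The ratings, student labels, and function name below are hypothetical and are not taken from the study; the sketch assumes only what the note states, namely that each student's ratings are averaged first and that those per-student means are then averaged for the teacher or for one ability group.

```python
from statistics import mean

# Hypothetical data for one teacher: student -> (ability group, assignment ratings).
# These values are invented for illustration and do not come from the study.
teacher_ratings = {
    "s1": ("high", [4, 5, 3]),
    "s2": ("low", [2, 3]),
    "s3": ("high", [5]),
}

def mean_of_means(students, group=None):
    """Average each student's ratings, then average those per-student means.

    If `group` is given, only students in that ability group are included.
    Returns None when no student qualifies.
    """
    per_student = [
        mean(scores)
        for ability, scores in students.values()
        if scores and (group is None or ability == group)
    ]
    return mean(per_student) if per_student else None

overall = mean_of_means(teacher_ratings)       # every student counts once
high_only = mean_of_means(teacher_ratings, "high")
low_only = mean_of_means(teacher_ratings, "low")

# For contrast: pooling every rating lets students with more rated
# assignments weigh more heavily than students with fewer.
pooled = mean(r for _, scores in teacher_ratings.values() for r in scores)
```

Because each student contributes a single mean, a child with many rated assignments does not dominate a teacher's average, which appears to be the reason the note gives for the adjustment.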
On Concept Maps as Potential “Authentic”
Assessments in Science

Richard Shavelson, Heather Lang, and Bridget Lewin
CSE Technical Report 388, 1994 ($4.00)

Using concept maps as an assessment tool will urge
educators to teach students more than simple facts
and concepts, but how different concepts relate to
each other. An evaluational tool such as concept
mapping urges the individual to think on a deeper
cognitive level than a “fill-in the blank” test would
require. There is value in both assessment tools;
neither should be ignored.

Fourth-grade teacher

The preceding statement from On Concept
Maps as Potential “Authentic” Assessments
in Science points to the multiple expected
benefits from the use of concept maps as alternative
assessments. In this report, CRESST researchers
Richard Shavelson, Heather Lang, and Bridget
Lewin examine concept mapping as an assessment
technique, exploring questions related to the validity
and reliability of student scores. The authors begin
with a clear definition of concept mapping:

“A concept map," write the authors, “constructed 
by a student, is a graph consisting of nodes repre- 
senting concepts and labeled lines denoting the 
relation between a pair of nodes (concepts)." 

Based on an extensive review of concept map
usage, the authors found that concept mapping
techniques differed widely.

“No less than 128 possible variations were
identified,” say the authors. “Methods for scoring maps
varied almost as widely, from the admonition ‘don't
score maps’ to a detailed scoring system for
hierarchical maps.”

The researchers' review led to the conclusion that
an integrative “working” cognitive theory is needed
to begin to limit this great variation for alternative
assessment purposes. “Such a theory,” conclude
the authors, “would also serve as a basis for much
needed psychometric studies of the reliability and
construct validity of concept maps since such studies
are almost nonexistent in the literature.”

The review presents issues arising from large-scale
use of mapping techniques, including the
importance of students' skills in using concept maps, and
the possible negative impact of teachers teaching to
the assessment if students have to memorize
maps provided by textbooks or themselves.

Specifications for the Design of Problem-Solving
Assessments in Science
Brenda Sugrue

CSE Technical Report 387, 1994 ($4.00)

In Specifications for the Design of Problem-Solving
Assessments in Science, CRESST researcher
Brenda Sugrue draws on the CRESST
performance assessment model to develop a new set of test
specifications for science. Sugrue recommends that
designers follow a straightforward approach for
developing alternative science assessments.

“Carry out an analysis of the subject matter
content to be assessed,” says Sugrue, “identifying key
concepts, principles, and procedures that are
embodied in the content.” She adds that much of this
analysis already exists in state frameworks or in the
national science standards.

Either multiple-choice, open-ended, or hands-on
science tasks can then be created or adapted to
measure individual constructs, such as concepts and
principles, and the links between concepts and
principles.

In addition to measuring content-related
constructs, Sugrue's model advocates measuring
metacognitive constructs and motivational constructs
in the context of the content. This permits more
specific identification of the sources of students'
poor performance. Students may perform poorly
because of deficiencies in content knowledge, and/or
deficiencies in constructs such as planning and
monitoring, and/or maladaptive perceptions of self
and task. The more specific the diagnosis of the
source of poor performance, the more specific can
be instructional interventions to improve
performance.

Sugrue's model includes specifications for task
design, task development, and task scoring, all
linked to specific components of problem-solving
ability. An upcoming CRESST report will discuss
the results of a study designed to evaluate the
effectiveness of the model for attributing variance in
performance to particular components of problem
solving and particular formats for measuring them.

Group Collaboration in Assessment: Competing
Objectives, Processes, and Outcomes
Noreen Webb

CSE Technical Report 386, 1994 ($4.00)

Learning from other students, developing
interpersonal skills, and maximizing
collaborative performance are three primary
goals of small-group collaboration. But according
to Noreen Webb in Group Collaboration in Assessment:
Competing Objectives, Processes, and Outcomes,
group assessment may or may not be an effective
strategy for measuring such goals.

“There may be an appropriate place for
collaborative group work in educational assessment,” asserts
Webb. “Most importantly,” she adds, “how group
work is used in an assessment should coincide with
the purpose of an assessment.”

Webb's own research and that of others strongly
suggests that the purposes must be clearly
understood from the start of a group assessment project.
Measuring individual student learning versus group
productivity, for example, may call for differences in
the assessments.

If the purpose is to measure individual
achievement, suggests Webb, then the instructions might
be worded to encourage individual effort. A
Connecticut assessment, for example, told students that
“each person should be able to explain fully the
conclusions reached by the group” and should be
prepared to give an oral presentation on their group's
experiment. Thus, students expected that they
would be held accountable individually.

But if the assessment purpose is different, so too
should be the focus, says Webb. She cites group
productivity as one example:

“Assessment focusing on group productivity,”
says Webb, “would give a group a task to complete,
and evaluation would focus on the completed task,
not on individual students' contributions to
completing the task.”

Webb says that teachers need to prepare students
for group assessment. Small-group collaboration
can help students to develop valuable
communications skills and give them a better “understanding of
what kinds of group processes help them learn as
individuals and what kinds of processes help
maximize group productivity,” she concludes.

The Evolution of a Portfolio Program: The
Impact and Quality of the Vermont Program in
Its Second Year

Daniel Koretz, Brian Stecher, Stephen Klein, and
Daniel McCaffrey

CSE Technical Report 385, 1994 ($4.00)

Part of an ongoing evaluation of the
Vermont portfolio assessment program by
RAND/CRESST researchers, this report
presents recent analyses of the reliability of
Vermont portfolio scores, and the results of school
principal interviews and teacher questionnaires.

The message, especially from Vermont teachers,
say the researchers, remains mixed. Math teachers,
for example, have modified their curricula and
teaching practices to emphasize problem solving and
mathematical communication skills, but many feel
they are doing so at the expense of other areas of the
curriculum. About one-half of the teachers report
that student learning has improved, but an equal
number feel that there has been no change.
Additionally, teachers reported great variation in the
implementation of portfolios into their classrooms,
including the amount of assistance provided to
students.

“One in four teachers,” found the authors, “does
not assist his or her own students in revisions, and
a similar proportion does not permit students to
help each other. Seventy percent of fourth-grade
teachers and 39% of eighth-grade teachers forbid
parental or other outside assistance.”

Consequently, students who receive more
support from teachers, parents and other students may
have a significant advantage over students who
receive little or no outside help.

Reliability problems continue. “The degree of
agreement,” write the authors, “among Vermont's
portfolio raters was much lower than among raters
in studies with other types of constructed response
measures.”

The authors suggest that one cause of the low
reliability was the diversity of tasks within each
portfolio. Because teachers and students are free to
select their own pieces, performance on the tasks is
much more difficult to assess than if the work were
standardized.

Despite these problem areas, support for the
portfolio program remains high. Teachers, for
example, expressed strong support for expanding
portfolios to all grade levels. Seventy percent of
principals said that their schools had extended
portfolio usage beyond the original Vermont state
mandate.

A Conceptual Framework for Analyzing the
Costs of Alternative Assessment
Lawrence O. Picus

CSE Technical Report 384, 1994 ($4.00)

Despite the fact that many states are
investing millions of dollars in the development
of alternative assessments, little is known
about the actual costs of such assessments. In A
Conceptual Framework for Analyzing the Costs of
Alternative Assessment, CRESST partner Lawrence
O. Picus analyzes many of the issues related to
identifying the costs of new assessments including
the relationship between costs and goals.

“If, as is often the case in education,” says Picus,
“there are multiple goals established for an
alternative assessment program, then estimation of the
costs of that program must include all of the
resources necessary to accomplish all of those goals.”

To identify alternative assessment costs, Picus
suggests the use of a three-dimensional model
comprised of levels of expenditures, kinds of expenditures,
and expenditure components. Levels of expenditures
are the source of expense such as national, state,
district, school, classroom, or private market levels.
Kinds of expenditures include personnel, materials,
supplies, and travel. Components include
assessment development, production, training, scoring,
reporting and program evaluation.

“The largest single expenditure item in any
assessment program,” concludes Picus, “seems likely to
be personnel.”

Opportunity costs must also be considered, adds
Picus. Resources committed to creating an
alternative assessment program are resources used to support
a former testing program or resources that could be
spent on other programs, such as bilingual
education.

The framework developed in A Conceptual
Framework for Analyzing the Costs of Alternative Assessment
addresses both opportunity costs and assessment
costs matched to goals.
Economic Analysis of Testing: Competency,
Certification, and “Authentic” Assessments
James S. Catterall and Lynn Winters
CSE Technical Report 383, 1994 ($4.00)

Cost-benefit and cost-effectiveness
analyses, say researchers James Catterall and
Lynn Winters in Economic Analysis of Testing:
Competency, Certification, and “Authentic”
Assessments, have similar policy purposes.

“Both analyses,” note the authors, “aim at what
choices might be made either to reach given goals
with lower costs, or to attain more results for a given
budget allocation.”

But trying to use either economic analysis is
difficult when applied to educational assessment
because anticipated benefits are moot. The authors
note, for example, that policy makers, test
coordinators, principals, and counselors have used minimum
competency tests to motivate students,
encouraging them, albeit negatively, to develop and improve
basic skills. Yet a study by James Catterall in 1990
showed that fewer than half of 736 students in eight
high schools (four states) were even aware that their
high school required them to pass a minimum
competency test prior to graduation. Clearly the
intended motivational benefit or effect sought by
policy makers was not reflected by what was truly
happening. Thus, tying the costs of assessments to
benefits and effects that may not really occur is
problematic.

Regarding performance assessments, the authors
suggest that linking policy and costs will be equally
challenging. Because performance assessments should
be good instructional activities themselves, it is
difficult to differentiate the costs into specific
categories such as assessment, curriculum, or, in cases
of scoring, professional development for teachers.

“That they [performance assessments] have
returns over and above current tests is presently
assumed,” conclude Catterall and Winters.
“Establishing the linkages between the costs and benefits
may be an important factor in the course of testing
reform in the 1990s.”

Analysis of Cognitive Demand in Selected
Alternative Science Assessments
Gail Baxter, Robert Glaser, and Kalyani Raghavan
CSE Technical Report 382, 1994 ($4.00)

Working with pilot science assessments in
California and Connecticut, the
researchers in Analysis of Cognitive Demand in
Selected Alternative Science Assessments focused on
cognitive activity required for successful completion
of performance assessment tasks. Of special interest
was the degree to which task performance reflected
differences in student understanding.

“We focused,” wrote Baxter, Glaser, and Raghavan,
“on the extent to which: (a) tasks allowed students
the opportunity to engage in higher order thinking
skills and (b) scoring systems reflected differential
performance of students with respect to the nature
of cognitive activity in which they engaged.”

Data came from three types of science assessment
tasks (exploratory investigation, conceptual
integration, and component identification), each varying
with respect to grade level, prior knowledge, stage
of development and purpose. Analyses of the data
resulted in some important recommendations for
the development of assessment tasks and scoring.

In general, wrote the authors, “tasks should: (a)
be procedurally open-ended affording students an
opportunity to display their understanding; (b)
draw on subject matter knowledge as opposed to
knowledge of generally familiar facts; and (c) be
cognitively rich enough to require thinking.”

The authors concluded that scoring systems should:
(a) link score criteria to task expectations; (b) be
sensitive to the meaningful use of knowledge; and
(c) capture the problem solving processes the
students engage in.
Measurement-Driven Reform: Research on Policy,
Practice, Repercussion

Audrey J. Noble and Mary Lee Smith

CSE Technical Report 381, 1994 ($4.00)

Demonstrating a top-down educational
reform strategy and a belief that assessment
can leverage educational change, the
Arizona state legislature in 1990 passed Arizona Revised
Statute 15-741. The legislation resulted in the
Arizona Student Assessment Program (ASAP), a program
that incorporated both standardized and
performance-based assessments. Measurement-Driven
Reform: Research on Policy, Practice, Repercussion
reports on how ASAP was conceived, negotiated,
and implemented. The CRESST researchers
conducting the study, Audrey Noble and Mary Lee
Smith, were critical of the policy process that created
ASAP.

ASAP “reveals both the ambiguities characteristic
of the [assessment] policy-making process,” write
Noble and Smith, “and the dysfunctional side effects
that evolve from the policy's disparities.”

Employing multiple research methods, the
researchers interviewed members of the policy-shaping
community and examined documents and artifacts
of the testing policy. Their analysis determined that
competing ideas about student learning, teachers,
curriculum, and assessment resulted in ineffective
implementation of the assessment program. Five
inconsistencies were reported:

• Policy makers' definitions of “learning” were
incoherent;

• Policy makers held dissonant expectations of
teachers;

• Policy makers clashed regarding the role of
curriculum;

• Policy makers alleged that a single performance
assessment could fulfill the dual purposes of
instructional improvement and accountability;

• The implementation plan of the Arizona
Student Assessment Program was a dysfunctional
side effect of a policy built on contradictory
ideals.

“Although ASAP appeals to many because of its
ambiguity,” conclude Noble and Smith, “this same
characteristic may undermine its capacity to effect
any substantial change in educational practice.”

What Happens When the Test Mandate Changes?
Results of a Multiple Case Study
Mary Lee Smith, Audrey J. Noble, Marilyn Cabay,
Walt Heinecke, M. Susan Junker, and Yvonne
Saffron

CSE Technical Report 380, 1994 ($4.50)

The Arizona Student Assessment Program
(ASAP) was designed to solve what policy
makers perceived to be the state's most
pressing educational problems: moving schools
toward the state curriculum framework and making
schools more accountable for student achievement.
However, as findings from this multiple case study
demonstrate, the actions of practitioners were far
from uniform in response to this policy mandate.

In this report, CRESST researcher Mary Lee
Smith and colleagues outline the results to date of
a three-year, qualitative study of school reactions to
the ASAP mandate. One of a series of reports on a
larger project, What Happens When the Test
Mandate Changes?, the present study addresses the
consequences of the change mandate in four
Arizona elementary schools during the first year of
implementation.

Using a case study methodology, the researchers
focused on the interplay of policy and practice by
engaging directly in the local, school site scene.
This particular approach allowed them to gain
access to participant meanings and to show how
meanings in action evolved over time.

Results from this study indicated that local school
responses to the policy mandate varied substantially.
The goal of transforming classroom instructional
practices was achieved in only one school that had
adopted such practices prior to the state mandate.

“Local interpretations and organizational norms
intervened to color, distort, delay, enhance, or
thwart the intentions of the policy and the
policy-shaping community,” concluded the authors.
Expectations for school reform based on mandates
must consider the vast disparities that exist between
individual schools and teachers.

Assessment, Testing, and Instruction:
Retrospect and Prospect
Robert Glaser and Edward Silver
CSE Technical Report 379, 1994 ($4.50)

Increasing concern about the nature and form
of student assessments and the uses made of
test results forms the basis for Assessment,
Testing, and Instruction: Retrospect and Prospect by
CRESST researchers Robert Glaser and Edward
Silver. The authors explore the nature of testing and
assessment by examining some of the deficiencies
and abuses associated with past practices in
educational measurement, then investigating present and
future possibilities for alternative forms of
assessment.

“At this point in time,” write Glaser and Silver,
“assessment and testing in American schools are
caught between the extensive rhetoric of reform and
the intransigence of long-established practices.”
Through an informative discussion of the two
standard purposes of educational assessment (testing
for selection and placement, and assessing
educational outcomes), the authors demonstrate the need
for an evaluation of the purposes of educational
testing.



MORE TECHNICAL REPORTS 
Policy Makers’ Views of Student Assessment 
Lorraine McDonnell 
CSE Technical Report 378, 1994 ($4.00) 

Engaging Teachers in Assessment of Their Students’ 
Narrative Writing: Impact on Teachers’ 
Knowledge and Practice 
Maryl Gearhart, Shelby A. Wolf, Bette Burkey, and 
Andrea K. Whittaker 
CSE Technical Report 377, 1994 ($4.00) 

Test Theory Reconceived 
Robert J. Mislevy 
CSE Technical Report 376, 1994 ($4.50) 

Linking Statewide Tests to the National Assessment 
of Educational Progress: Stability of Results 
Robert L. Linn and Vonda L. Kiplinger 
CSE Technical Report 375, 1994 ($4.00) 
DATABASES 

Alternative Assessments in Practice Database 
Listings from over 250 developers of new 
assessments, Macintosh version only, 1993 ($15.00) 

VIDEOTAPES 
Assessing the Whole Child 
18-minute video program includes new practitioner's 
guidebook, V3, 1994 ($15.00) 

Portfolio Assessment and High Technology 
10-minute video program includes practitioner's 
guidebook on portfolio assessment, V2, 1992 
($12.50) 

HANDBOOK 

CRESST Performance Assessment Models: Assessing 
Content Area Explanations 
Eva L. Baker, Pamela R. Aschbacher, David Niemi, 
and Edynn Sato 
The CRESST Handbook, 1992 ($10.00) 



RESOURCE PAPER 

Writing What You Read: A Guidebook for the 
Assessment of Children’s Narratives 
Shelby Wolf and Maryl Gearhart 
CSE Resource Paper 10, 1993 ($4.00) 

For a complete list of all CRESST products, please 
contact Kim Hurst at 310-206-1532, e-mail: 
kim@cse.ucla.edu, or UCLA Center for the Study of 
Evaluation, 10880 Wilshire Blvd., Los Angeles, 
CA 90024-1394. 



FOLD AND SECURE 



Place Postage Here 



UCLA Center for the Study of Evaluation 
10880 Wilshire Blvd., Suite 700 
Los Angeles, California 90024-1394 
Order Form 

Attach additional sheet if more room is needed. Form is pre-addressed on reverse. 
CSE/CRESST Products 



CSE Number    Title    Number of Copies    Price per Copy    Total Price 



POSTAGE & HANDLING 
(Special 4th Class Book Rate) 

Subtotal of $0 to $10    add $1.50 
$10 to $20               add $2.50 
$20 to $50               add $3.50 
over $50                 add 10% of Subtotal 



ORDER SUBTOTAL _ 
POSTAGE & HANDLING (scale at left) _ 
California residents add 8.25% _ 

TOTAL 



Orders of less than $10.00 must be prepaid 

Your name & mailing address (please print or type): 

[ ] Payment enclosed    [ ] Please bill me 

[ ] I would like to receive free copies of the 
CRESST and Evaluation Comment 
publications. 



UCLA Center for the Study of Evaluation 
10880 Wilshire Blvd., Suite 700 
Los Angeles, California 90024-1394 



ADDRESS CORRECTION REQUESTED (ED 03) 



NONPROFIT ORG. 
U.S. POSTAGE 
PAID 
U.C.L.A. 






Dr. Carol Ascher 
Senior Research Assoc. 

Institute for Urban & Minority Ed. 
Columbia University Teachers College 
P.O. Box 40 



